Abstract — In this article we introduce the concept and the first implementation of a lightweight client-server framework as middleware
for distributed computing. On the client side, an installation without administrative rights or privileged ports can turn any computer into a
worker node. Only a Java runtime environment and the JAR files comprising the workflow client are needed. One open server port is
sufficient to connect all clients to the engine. The engine submits data to the clients and orchestrates their work by workflow descriptions
from a central database. Clients request new task descriptions periodically, so the system is robust against network failures. In the
basic set-up, data uploads and downloads are handled via HTTP communication with the server. The performance of the modular system
could be improved further by using dedicated file servers or distributed network file systems.

We demonstrate the design features of the proposed engine in real-world applications from mechanical engineering. We have used 
this system on a compute cluster in design-of-experiment studies, parameter optimisations and robustness validations of finite element 
structures. 

Index Terms — Java grid engine, workflow automation, distributed services, parameter optimisation, design-of-experiments 
1 Introduction

Nowadays it is not only engineers and computer scientists
who use various simulation codes and data pre- and
post-processing tools for almost every task during
product development or experimental design. In the
financial sector, too, automated data analysis and predictions
from numerical models are common. Integrating consecutive
tasks from distributed systems in workflows frees
the user from the need to transfer data and to invoke
applications in due time.

During the last decade middleware such as Globus 
Toolkit [1], [2], [3], Unicore [4], [5], and gLite [6], [7] 
was developed in grid projects. These tools mainly focus 
on service-oriented grid infrastructures with seamless 
and secure access to data sources for users without 
experience in grid computing. The middlewares expose 
applications via web services [8] to workflow managers 
like Taverna workbench [9], [10] enabling users to design 
and execute complex workflows in graphical interfaces 
and monitor the operations remotely. Due to the inherent 
complexity of abstract database interfaces and multi- 
ple middleware layers specialists are needed to install, 
configure, and test these tools. The specialist must be 
supported by the local administrators of the worker 
nodes as various tasks require administrative rights. 
Users without programming experience cannot deploy
data streams in graphical interfaces unless the access
to data and applications is encapsulated by particular
service wrappers. In our experience with scenarios like
this, developers spent most of their time setting up and
debugging the diverse tools. The construction of workflows
via graphical user interfaces, however, is not flexible or
efficient enough to save a reasonable amount of time.

As a typical example from mechanical engineering we 
have chosen numerical simulations where input decks 
containing finite element models have to pass through 
several workflow stages. At each stage an application 
processes the resulting data from a previous stage to 
become the input data of the subsequent one. In real- 
world scenarios we face many different applications 
and various restrictions, e.g. a CAD program to create 
a modified model only runs on a dedicated Windows 
system, while the meshing generator uses an old but 
reliable Unix system, and the finite element simulation 
code runs on a Linux cluster without sufficient licences 
for all nodes being available. To expose a single service 
it would not make sense to install Globus Toolkit, Uni- 
core, or even gLite on all these nodes. But how can a 
bunch of legacy software in a highly heterogeneous and 
distributed environment be orchestrated without having 
administrative rights for the machines? 

2 The Engine 

In the framework we cover below, we have inverted
the standard scenario of grid computing where users
act as clients contacting services on distributed hosts.
Instead of wrapping services on every host by complex
middleware for accounting, workflow management,
and job control we developed a lightweight grid client
that can be deployed without installation. This client
program turns applications on different hosts into
remote controllable services. For each application a generic
wrapper is extended to deal with the application-specific
parameters and to create and start suitable batch scripts.
New wrappers can be added to the running system
without a restart. At the central host a servlet container,
i.e. Apache Tomcat, processes the communication of all
clients [11], [12]. A relational database maintains workflow
descriptions and parameters for the applications
[13], [14], see Figure 1.

Fig. 1. The framework in a simple scenario: the server
engine utilises a database for workflow descriptions and
a web server for the communication with the clients.
Applications are wrapped by lightweight Java clients
periodically querying the server engine for new tasks.

2.1 Distributed workflows

In an exemplary workflow the client downloads data
and processes it, i.e. it starts an application editing
the data. The client monitors the ongoing process and
uploads the results after the application has finished.
Further processing of the resulting data can be delegated
to applications on other clients. The clients communicate
with a servlet container on the central host by accessing a
URL with their logical client name as parameter, e.g.:

http://server:8180/engine/Tasks?Client=node-01

The underlying servlet Tasks queries the server database
for open tasks for this client node-01 and generates an
XML web page [15] with the workflow description of the
tasks. In our simulation example the requesting client
receives the detailed description below as response:

<Job No="1" Type="Download">
  <Parameter ServerDir="models" File="inputdeck_001.dat"/>
</Job>

<Job No="2" Type="JobAnsys" Timeout="1000">
  <Parameter Input="inputdeck_001.dat"/>
</Job>

<Job No="3" Type="JobParseAnsysEigenfreq" Timeout="1000">
  <Parameter Modefile="eigenmode.asc" Freqfile="eigenfreq.asc"/>
</Job>

<Job No="4" Type="Upload">
  <Parameter ServerDir="results/sim_001" File="eigenmode.asc"/>
</Job>

<Job No="5" Type="Upload">
  <Parameter ServerDir="results/sim_001" File="eigenfreq.asc"/>
</Job>

The engine stores parameters for each type of job in
a separate table, in our example Download, Upload,
JobAnsys, and JobParseAnsysEigenfreq. Client applications
can be supplied with an arbitrary number of
parameters. For each parameter a column in the
corresponding job table for this type of application has
to be created. The table columns are the attributes of
the embedded Parameter elements. The database table
Download contains paths to the input decks for the Ansys
simulations in the server file system:

select * from Download;

If available, the client uses the file length, the
last-modified time stamp, and the MD5 sum [16] to verify
the integrity of downloaded files. Repeated downloads
of identical files can be avoided by comparing them to a
local file cache. Table JobAnsys contains the parameters
for a wrapper class with the identical name starting an
Ansys simulation. The job wrappers will be discussed
later in section 2.2. Here the input deck filename is the
only parameter to the Ansys job wrapper:

select * from JobAnsys;
+-------+-------------------+---------+
| JobID | Input             | Timeout |
+-------+-------------------+---------+
| 11    | inputdeck_001.dat | 1000    |
| 12    | inputdeck_002.dat | 1000    |
+-------+-------------------+---------+

The generic parameter Timeout is available for all
kinds of jobs; it is particularly useful for tests during
the construction of new workflows. Once set, running
jobs are automatically aborted after a previously defined
time, in our case after 1000 seconds. The next job
JobParseAnsysEigenfreq parses the Ansys output and
creates the files Freqfile and Modefile for natural
frequencies and mode shapes. The Ansys output filename is
not used as a third parameter because the default name
is already hardcoded in the parser script:

select * from JobParseAnsysEigenfreq;
+-------+---------------+---------------+---------+
| JobID | Modefile      | Freqfile      | Timeout |
+-------+---------------+---------------+---------+
| 11    | eigenmode.asc | eigenfreq.asc | 120     |
| 12    | eigenmode.asc | eigenfreq.asc | 120     |
+-------+---------------+---------------+---------+

The files to be uploaded are created by
the parser script, thus their length and MD5 sum are
unknown beforehand. They are uploaded to the server;
column ServerDir creates separate result directories in the
server file system:

select * from Upload;
+----+-----------------+---------------+--------+---------+-----------+
| ID | ServerDir       | File          | Length | MD5 Sum | Last Mod. |
+----+-----------------+---------------+--------+---------+-----------+
| 11 | results/sim_001 | eigenmode.asc |        |         |           |
| 12 | results/sim_001 | eigenfreq.asc |        |         |           |
| 13 | results/sim_002 | eigenmode.asc |        |         |           |
| 14 | results/sim_002 | eigenfreq.asc |        |         |           |
+----+-----------------+---------------+--------+---------+-----------+

In the previous tables each line describes a single job for
a client. One or more jobs can be combined into tasks with
unique task IDs. Tasks are the basic workflow elements
of our engine. Clients are either on hold or active on
the jobs of a single task. The next table shows an excerpt
of the Tasks table structure. The column Job specifies
the names of the tables containing the job parameters
while column JobID indicates the corresponding line
numbers. The table contains two similar tasks, each with
five jobs for clients from the cluster group. The first
task (TaskID=1) consists of a download job, an Ansys
simulation, a call of the script to parse the result data,
and two upload jobs:



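select TaskID, Job, JobID, Client, ClientGroup, Status from Tasks;
+--------+------------------------+-------+--------+-------------+---------+
| TaskID | Job                    | JobID | Client | ClientGroup | Status  |
+--------+------------------------+-------+--------+-------------+---------+
| 1      | Download               | 11    |        | cluster     | waiting |
| 1      | JobAnsys               | 11    |        | cluster     | waiting |
| 1      | JobParseAnsysEigenfreq | 11    |        | cluster     | waiting |
| 1      | Upload                 | 11    |        | cluster     | waiting |
| 1      | Upload                 | 12    |        | cluster     | waiting |
| 2      | Download               | 12    |        | cluster     | waiting |
| 2      | JobAnsys               | 12    |        | cluster     | waiting |
| 2      | JobParseAnsysEigenfreq | 12    |        | cluster     | waiting |
| 2      | Upload                 | 13    |        | cluster     | waiting |
| 2      | Upload                 | 14    |        | cluster     | waiting |
+--------+------------------------+-------+--------+-------------+---------+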
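On the client side, the XML task description has to be turned back into a list of jobs with their parameters. The following is a minimal sketch of how this could look with the DOM parser from the Java standard library; it is not the authors' implementation. The class name `TaskDescriptionParser` and the enclosing root element `Task` used in the usage note are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the published engine.
class TaskDescriptionParser {

    /** Returns one attribute map per Job element, in document order.
     *  Attributes of the Job and of its Parameter children are merged. */
    static List<Map<String, String>> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList jobs = doc.getElementsByTagName("Job");
            List<Map<String, String>> result = new ArrayList<>();
            for (int i = 0; i < jobs.getLength(); i++) {
                Element job = (Element) jobs.item(i);
                Map<String, String> attrs = new LinkedHashMap<>();
                copyAttributes(job, attrs);                    // No, Type, Timeout
                NodeList params = job.getElementsByTagName("Parameter");
                for (int j = 0; j < params.getLength(); j++) {
                    copyAttributes((Element) params.item(j), attrs);
                }
                result.add(attrs);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed task description", e);
        }
    }

    private static void copyAttributes(Element e, Map<String, String> target) {
        NamedNodeMap a = e.getAttributes();
        for (int k = 0; k < a.getLength(); k++) {
            target.put(a.item(k).getNodeName(), a.item(k).getNodeValue());
        }
    }
}
```

Called on a document such as `<Task><Job No="1" Type="Download"><Parameter ServerDir="models" File="inputdeck_001.dat"/></Job></Task>`, the sketch yields a single map containing `No`, `Type`, `ServerDir`, and `File`. The `Type` entry would then select the job wrapper class.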



A task can be assigned to a specific client or to 
a client group, here cluster. In the mapping table 
ClientGroups each client can be associated with several 
groups, here some nodes of the cluster group are shown: 

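select * from ClientGroups;
+-------------+---------+
| ClientGroup | Client  |
+-------------+---------+
| cluster     | node-01 |
| cluster     | node-02 |
| cluster     | node-03 |
| cluster     | node-04 |
| cluster     | ...     |
+-------------+---------+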





Tasks with status waiting are assigned to the clients in
order of the lowest available ID. To block the task for other
clients the Tasks servlet changes the status to active
with the following SQL command [13], [14]:

UPDATE Tasks SET Status='active', Client='$client'
WHERE TaskID='$taskID';

In SQL statements we use '$'-notation to indicate the
values of variables. The servlet assembles the SQL statements
in string buffers, replacing variables by their values;
here it substitutes the logical name of the requesting
client and the current task ID. Further SQL statements
process all jobs of this task by subsequent queries to the
tables Download, JobAnsys, JobParseAnsysEigenfreq
etc., i.e. they select the lines referenced in the column JobID and
generate the XML document with workflow instructions
for the client.
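The '$'-substitution described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `SqlTemplate` and the quote-doubling step are not taken from the article. Doubling single quotes is a minimal precaution; in production code `java.sql.PreparedStatement` with bound parameters is the more robust choice.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the '$'-notation; not the authors' code.
class SqlTemplate {
    private static final Pattern VAR = Pattern.compile("\\$(\\w+)");

    /** Replaces every $name in the template by the matching value. */
    static String fill(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for $" + m.group(1));
            }
            // Assumption: double single quotes so that a value cannot
            // terminate the surrounding SQL string literal.
            String escaped = value.replace("'", "''");
            m.appendReplacement(out, Matcher.quoteReplacement(escaped));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the values `client=node-01` and `taskID=1`, the UPDATE statement above becomes `UPDATE Tasks SET Status='active', Client='node-01' WHERE TaskID='1';`.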

When a client is initially started, the server's IP address,
a unique logical name for the client (e.g. node-01),
a local directory (e.g. /tmp/node-01/), and optionally a
client group can be specified. Downloads, uploads, and
all applications only act on the data in this client directory.
The client in our example creates a working directory
ClientDir="ansys_001" for the Ansys task within
/tmp/node-01/. The optional parameter ServerDir is
used to relocate files in the directory structure during
uploads and downloads, e.g. the input decks come from a
special model library directory on the server. The Ansys
results from different clients are also collected on the
server in a joint directory results/. Clients and server
automatically create all directories from the workflow
descriptions.



Several clients with different client directories can run 
on the same machine without interfering. Independent 
clients with distinct logical names provide a convenient 
way to exploit the computing power of multicore and 
multiprocessor systems. 

In our example the models/ directory on the server
comprises a set of input decks. The file system is exposed
to the clients by the web server via HTTP. Clients can
download files by requesting a URL consisting of the
server's IP address, the web server subdirectory, and the
name of the file. The path information is URL-encoded
to deal with potential whitespace in directory or file
names. For file uploads the clients send HTTP POST
forms and SOAP-with-attachment requests [17] to the
server. On the server side incoming multipart/form-data
is handled by a particular upload servlet and the
web service extension Axis2 [18]. These services write
uploaded files directly to the file system of the web server,
making the data immediately available for other clients.
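The URL encoding of download paths can be illustrated with `java.net.URI`, whose multi-argument constructor percent-encodes characters that are illegal in a path, such as spaces. The sketch below is an assumption about how such a URL might be assembled; the class name `DownloadUrl` and the web directory `engine` are placeholders, not details from the article.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; host, port and directory names are placeholders.
class DownloadUrl {

    /** Builds http://host:port/webDir/serverDir/file with an encoded path. */
    static String build(String hostAndPort, String webDir, String serverDir, String file) {
        String path = "/" + webDir + "/" + serverDir + "/" + file;
        try {
            // The five-argument constructor quotes illegal path characters,
            // so a space becomes %20 while the slashes are kept.
            return new URI("http", hostAndPort, path, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for " + path, e);
        }
    }
}
```

For example, `DownloadUrl.build("server:8180", "engine", "models", "input deck 001.dat")` returns `http://server:8180/engine/models/input%20deck%20001.dat`. Note that `java.net.URLEncoder` is meant for form data and would encode the space as `+`, which is not appropriate inside a path.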

The input decks in the server's models/ directory
are part of different Ansys simulation runs. The first
requesting client downloads inputdeck_001.dat to
its local working directory /tmp/node-01/ansys_001/,
performs an Ansys simulation, and starts a script
that parses the results into the files eigenmode.asc
and eigenfreq.asc in the same directory. Table
TasksWorkingDir makes it possible to allocate each task to a
different working directory on the clients:

select * from TasksWorkingDir;
+--------+-----------+-------------+
| TaskID | Dir       | EraseOnExit |
+--------+-----------+-------------+
| 1      | ansys_001 | false       |
| 2      | ansys_002 | false       |
+--------+-----------+-------------+

If the flag EraseOnExit is set, the subdirectory is deleted
automatically once the task is finished. If it is set to
false, subsequent tasks can operate on the result data
cached in the working directory.

2.2 Job wrappers for local applications 

Basic jobs for downloading and uploading data are
integral parts of the grid client. A workflow engine must
also be capable of wrapping further applications running on
the clients. In our framework a generic class Job is
extended to wrap new applications. The class name of the
extended job wrapper corresponds to the job name in the
Tasks table. The grid client automatically starts the job
wrapper and supplies it with the job parameters of the
XML description. The extended wrapper class employs
these parameters and assembles a valid script calling
the application. The script is temporarily generated in
an internal string buffer. A method of the generic job
class replaces all aliases (see below) and writes the script to the
working directory of the client. The grid client uses the Java
Native Interface (JNI) to start the script and monitor the
activity of the application [19]. In our example, the string
buffer created by JobAnsys looks like this:

#!/bin/bash
cd /tmp/node-01/ansys_001/
export LM_LICENSE_FILE=ANSYS_LICENSE
ANSYS -i inputdeck_001.dat

The shell header and the change-directory command
in lines 1 and 2 are selected by the generic job class
according to the operating system of the client. Further
lines are created by the extended wrapper class.
ANSYS_LICENSE here defines a so-called alias that can be
understood as an internal environment variable. Before
the script is written to the working directory the generic
job wrapper contacts an Aliases servlet on the server
and requests a list of known aliases for the given client. It
then replaces all aliases by machine-specific parameters.
Aliases directly attached to a client's name have the
highest priority, e.g. JAVA on client node-01 is mapped
to /opt/jdk1.6.0_12/bin/java. Then the aliases for
entire groups of clients are exchanged. Finally, global
aliases not assigned to a client or client group (e.g. JAVA
is mapped to java) are applied:

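select * from Aliases;
+---------------+-----------------------------+-------------+---------+
| ServiceName   | Service                     | ClientGroup | Client  |
+---------------+-----------------------------+-------------+---------+
| ANSYS         | /opt/ansys/v11/bin/ansys11  | cluster     |         |
| ANSYS_LICENSE | /opt/ansys/shrd/license.lic | cluster     |         |
| JAVA          | /usr/lib/jdk1.6.0/bin/java  | linux       |         |
| JAVA          | /opt/jre-6-solaris/bin/java | Solaris     |         |
| JAVA          | /opt/jdk1.6.0_12/bin/java   |             | node-01 |
| JAVA          | /opt/jdk1.6.0_12/bin/java   |             | server  |
| JAVA          | java                        |             |         |
| GNUPLOT       | /usr/bin/gnuplot            | linux       |         |
| GNUPLOT       | /opt/gnu/bin/gnuplot        | Solaris     |         |
+---------------+-----------------------------+-------------+---------+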
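The priority order of the alias lookup (client before client group before global entry) can be sketched as follows. This is an illustrative model over an in-memory copy of the Aliases table, not the authors' servlet; the class names and the assumption that a client's group memberships are passed in as a set are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the Aliases table lookup.
class AliasResolver {

    static final class Alias {
        final String name, value, group, client;   // group/client may be null
        Alias(String name, String value, String group, String client) {
            this.name = name; this.value = value; this.group = group; this.client = client;
        }
    }

    /** Returns the value for an alias: client entry first, then an entry for
     *  one of the client's groups, then a global entry; null if none exists. */
    static String resolve(List<Alias> table, String name, String client, Set<String> groups) {
        String groupHit = null;
        String globalHit = null;
        for (Alias a : table) {
            if (!a.name.equals(name)) continue;
            if (client.equals(a.client)) return a.value;          // highest priority
            if (a.client == null && a.group != null) {
                if (groupHit == null && groups.contains(a.group)) groupHit = a.value;
            } else if (a.client == null && globalHit == null) {
                globalHit = a.value;                               // no client, no group
            }
        }
        return groupHit != null ? groupHit : globalHit;
    }
}
```

With the table above, JAVA resolves to `/opt/jdk1.6.0_12/bin/java` for node-01, to `/usr/lib/jdk1.6.0/bin/java` for any other member of the linux group, and to plain `java` for a client without matching entries.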





The next listing shows the extended job wrapper class
for Ansys. The grid client creates a job class instance,
assigns it the logical client name, the working
directory dir, and the Ansys-specific parameters, and
starts the run() method in a separate thread. The
call of replaceAliases() exchanges the aliases ANSYS
and ANSYS_LICENSE in the generic wrapper script for
machine-specific values from the Aliases table. Based
on the client's name the Aliases servlet returns a
list of all known aliases as key-value pairs. Function
createScript() writes the completed script to a file
script_ansys in the client working directory. The
thread waits at executeScript() until the Ansys simulation
has finished:

public class JobAnsys extends Job
{
    String inputdeck;  // Parameter for Ansys

    public void setParameters(Element parameter)
    {
        inputdeck = getAttribute(parameter, "Input");
    }

    public void run()
    {
        log("Starting JobAnsys for " + dir + "/" + inputdeck);
        StringBuffer t = new StringBuffer();
        t.append("export LM_LICENSE_FILE=ANSYS_LICENSE" + "\n");
        t.append("ANSYS -b -i " + inputdeck + "\n");

        Script s = new Script(dir, "script_ansys");
        s.createScript(getShellHeader()
                       + replaceAliases(t.toString()));
        s.executeScript();  // blocks until Ansys has finished

        // The input deck writes mode4.png on success
        if (!(new File(dir, "mode4.png")).exists())
        {
            log("-- Job failed --");
            setStatus(Job.JOB_ANSYS_FAILED);
        }

        flag_done = true;  // Signal to the main thread
    }
}

At the same time the main thread monitors the process
load and disk space of the client in a loop. A boolean
flag_done notifies the main thread that the Ansys
job is finished. In our example the Ansys input decks
contain additional instructions to create images of the
simulated finite element models. The wrapper uses the
image mode4.png to ascertain whether the simulation
finished successfully and data for further processing is
available.

The job wrapper automatically redirects the standard output
and error streams to a log file in the working
directory. With some applications running separately in
a background thread, e.g. PamCrash impact simulations,
only the content of the log file can be analysed to
monitor the progress and determine when the job will
be finished. The job wrapper passes the final status of
the job execution as an integer number to the grid
client, indicating whether the simulation run has been
successful or not. After all jobs of a task are completed,
the grid client contacts the servlet TaskCompleted on
the central host and sends the task ID, status information,
and additional information (total runtime of the task, the
current process load and free disk space) to the server.

The status information allows the engine to decide
whether to redistribute the same task, e.g. to a different
client, or to tick it off and unlock further tasks depending
on it (see section 2.5, Task chains). If a client issues a
request for new tasks without having completed the last
active task by calling TaskCompleted, e.g. because the
process was interrupted or the client was switched off,
the status of the former task is set back to waiting.
The Tasks servlet implements this by an SQL update:

UPDATE Tasks SET Status='waiting'
WHERE Client='$client' AND Status='active';

The interrupted client then resumes its work and restarts
the task.

2.3 Parameter template files 

Sometimes command line arguments are not sufficient 
and special parameter files are needed to control an 
application. Before the application is started parameter 
files can be downloaded from the central server or a 
particular job can directly create them on the client. For 
the download case we have developed a generic job that 
substitutes embedded tags in a parameter template file 
by numerical values from a list. Parameter template files 
are useful in design-of-experiment studies, numerical 
optimisations, or analysis of the robustness of a solution, 
where many different sets of parameters have to be 
tested. 

For this purpose the pre-defined template file is deployed on
the server and the job ReplaceTag generates a valid
parameter file for a simulation run from the downloaded
template. This job parses the template and looks for
placeholders in XML-element form. The names of the
elements can be chosen arbitrarily to avoid collisions with
keywords of the parameter file. The mandatory attribute
ID serves as a key to replace the entire tag by a numerical
value from the parameter list of the ReplaceTag job. To
handle fixed-format files with strict column widths an
optional attribute Len specifies the number of characters;
the numerical values are rounded down or padded with
whitespace accordingly. The XML tags
can be extended by further attributes, e.g. we have
defined minimum and maximum bounds (or mean
value and deviation) for each parameter. This enables
ReplaceTag to substitute the placeholders by random
values from the defined range (or values from a normal
distribution around the mean value). This feature
allows the user to select and alter arbitrary parameters
in design-of-experiment studies on the fly without any
changes to the workflow.

The next listing provides an excerpt of an input deck
template for Ansys [20] that we use in tests with finite
element simulations. The first line defines a square solid
plate with edge length a, the next three lines construct
solid cylinders at fixed positions inside the plate. In the
input deck we substitute the radii of the cylinders by
tags with identifiers r1, r2, and r3. The last line drills
holes into the structure by subtracting the cylinder shapes
from the plate (cf. Fig. 2):

RECTNG, 0, a, 0, a
CYL4, 32e-3, 32e-3, <TAG ID="r1" Min="1e-3" Max="7e-3" Len="5"/>
CYL4, 28e-3, 9e-3, <TAG ID="r2" Min="1e-3" Max="7e-3" Len="5"/>
CYL4, 10e-3, 30e-3, <TAG ID="r3" Min="1e-3" Max="7e-3" Len="5"/>
ASBA, 1, ALL
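The substitution of such tags can be sketched with regular expressions. The sketch below is not the authors' ReplaceTag job: the class name `TagReplacer`, the choice to pass values as strings, and the left-justified padding are assumptions. Only the attributes ID and Len are evaluated; drawing random values from Min and Max is left out.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of tag substitution in a parameter template.
class TagReplacer {
    private static final Pattern TAG = Pattern.compile("<TAG\\s+([^>]*?)/>");
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Replaces every <TAG ID="..."/> by the value registered for that ID. */
    static String replaceTags(String template, Map<String, String> values) {
        Matcher tag = TAG.matcher(template);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            String id = null;
            int len = -1;                                   // -1: no fixed width
            Matcher attr = ATTR.matcher(tag.group(1));
            while (attr.find()) {
                if (attr.group(1).equals("ID")) id = attr.group(2);
                if (attr.group(1).equals("Len")) len = Integer.parseInt(attr.group(2));
            }
            String value = (id == null) ? null : values.get(id);
            if (value == null) {
                throw new IllegalArgumentException("No value for tag " + id);
            }
            tag.appendReplacement(out, Matcher.quoteReplacement(fit(value, len)));
        }
        tag.appendTail(out);
        return out.toString();
    }

    /** Cuts the text to len characters or pads it with trailing blanks. */
    static String fit(String value, int len) {
        if (len < 0) return value;
        if (value.length() > len) return value.substring(0, len);
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < len) sb.append(' ');
        return sb.toString();
    }
}
```

With `r1` mapped to `4e-3`, the second template line becomes `CYL4, 32e-3, 32e-3, 4e-3 ` (padded to five characters). Cutting a plain decimal such as `0.0035` to `0.003` rounds it down, but cutting a value in exponent notation could remove the exponent, so a real implementation would have to format numbers before fitting them.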

With this template we explore the behaviour of the plate
for holes of random size, with radii in the feasible range
from Min=1 mm to Max=7 mm (see section 3.1 for more
details). In a second study the radii are additionally
subject to an optimisation procedure that attempts to
match the simulated natural frequencies of the plate
with a given spectrum (see section 3.2). To keep track of
the parameters actually used in the studies, ReplaceTag
creates a log file for each simulation run. The log file
and the resulting frequencies from the Ansys simulation
are uploaded to a result directory. A client on the server
initiates a job to add the new information to a database
table maintaining all parameter and result values of the
studies.

2.4 Joint services across mixed operating systems 

The workflow engine can merge identical applications
and functionalities from loosely coupled heterogeneous
clients as abstract services. The engine provides
transparent access to services as resources in the grid,
independent of the underlying client's operating system.
In general the alias mechanism is not sufficient, as
not only the parameters but the scripts themselves differ
on different platforms. To simplify the maintenance of
the system we have put all scripts for the
operating systems in use into the same job wrapper class
so that the wrapper can switch between them. The
workflow engine maintains a mapping table specifying
an operating system for each client name.

The complete Ansys wrapper contains two script tem- 
plates for Windows and Linux. These templates only dif- 
fer in an export versus a set command while utilising 
identical aliases for the Ansys binary and the license 
server. In the next step we have introduced a new 
client group ansys selecting all machines able to run 
Ansys. Tasks assigned to this group are automatically 
distributed to all requesting grid clients independent of 
their operating system. 
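The operating system switch inside a wrapper can be pictured as follows. This is a hedged sketch rather than the actual Ansys wrapper: the class name `ScriptTemplate`, the Windows header `@echo off`, and the `cd /d` command are assumptions. The tokens ANSYS and ANSYS_LICENSE are left in place because they are aliases that are replaced in a later step.

```java
import java.util.Locale;

// Hypothetical sketch of a wrapper holding two script templates.
class ScriptTemplate {

    /** Assembles the Ansys start script for the given operating system name. */
    static String ansysScript(String os, String workDir, String inputDeck) {
        boolean windows = os.toLowerCase(Locale.ROOT).startsWith("windows");
        String nl = windows ? "\r\n" : "\n";
        StringBuilder s = new StringBuilder();
        if (windows) {
            s.append("@echo off").append(nl);                 // assumed batch header
            s.append("cd /d ").append(workDir).append(nl);
            s.append("set LM_LICENSE_FILE=ANSYS_LICENSE").append(nl);
        } else {
            s.append("#!/bin/bash").append(nl);
            s.append("cd ").append(workDir).append(nl);
            s.append("export LM_LICENSE_FILE=ANSYS_LICENSE").append(nl);
        }
        // Identical on both platforms; ANSYS is an alias for the binary.
        s.append("ANSYS -b -i ").append(inputDeck).append(nl);
        return s.toString();
    }
}
```

The only platform-specific parts are the header, the change-directory command, and `export` versus `set`, which mirrors the description above; everything machine-specific stays in the Aliases table.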

Further client-dependent parameters of services can
again be handled with aliases, e.g. an alias JAVA_OPTIONS
sets the maximum available heap size for a Java
process to -Xmx4000M on a particular machine. If we
subsume all machines with a large amount of memory in
a new group ansys-large, while they are still members
of the normal ansys group, they participate in every
Ansys simulation, whereas extensive simulation runs
with bulky input decks can be allocated exclusively to
the new client group.

The combination of client groups, aliases lists, and job 
wrappers with operating system switches allows the use 
of abstract services in the workflow description table 
Tasks whereas the actual implementations of services 
differ on the clients. 

2.5 Task chains

Complex workflows can be constructed with task chains 
realised by a special column DependsOnTask in the tasks 
table. The column maintains the task ID of a predecessor 
task. In the following example tasks 1 and 3 can be 
immediately executed by clients from the cluster group 
while tasks 2 and 4 for a client running on the server de- 
pend on the results of the preceding Ansys simulations: 

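select TaskID, Job, JobID, DependsOnTask, Client, ClientGroup
from Tasks;
+--------+------------------------+-------+---------------+--------+-------------+
| TaskID | Job                    | JobID | DependsOnTask | Client | ClientGroup |
+--------+------------------------+-------+---------------+--------+-------------+
| 1      | Download               | 40    |               |        | cluster     |
| 1      | JobAnsys               | 21    |               |        | cluster     |
| 1      | JobParseAnsysEigenfreq | 21    |               |        | cluster     |
| 1      | Upload                 | 80    |               |        | cluster     |
| 1      | Upload                 | 81    |               |        | cluster     |
| 2      | Download               | 41    | 1             | server |             |
| 2      | InsertIntoDatabase     | 11    | 1             | server |             |
| 3      | Download               | 42    |               |        | cluster     |
| 3      | JobAnsys               | 22    |               |        | cluster     |
| 3      | JobParseAnsysEigenfreq | 22    |               |        | cluster     |
| 3      | Upload                 | 82    |               |        | cluster     |
| 3      | Upload                 | 83    |               |        | cluster     |
| 4      | Download               | 43    | 3             | server |             |
| 4      | InsertIntoDatabase     | 12    | 3             | server |             |
+--------+------------------------+-------+---------------+--------+-------------+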





One action of the servlet TaskCompleted is to set the
status of a completed task to done and to remove
the respective task ID from all further entries in the
DependsOnTask column:

UPDATE Tasks SET Status='done' WHERE TaskID='$taskID';
UPDATE Tasks SET DependsOnTask=NULL WHERE DependsOnTask='$taskID';

This procedure is repeated for each completed task and
activates all tasks depending on the one just finished.
Servlet Tasks then assigns the first available task to a
requesting client by the following scheme:
CREATE TEMPORARY TABLE TasksTemp SELECT * FROM Tasks
  WHERE Status='waiting' AND DependsOnTask IS NULL ORDER BY TaskID;

CREATE TEMPORARY TABLE TasksTemp2 SELECT * FROM TasksTemp
  WHERE Client IS NULL ORDER BY TaskID;

SELECT * FROM TasksTemp WHERE Client='$client';

SELECT * FROM TasksTemp2 T, ClientGroups C
  WHERE T.ClientGroup=C.ClientGroup AND C.Client='$client';

SELECT * FROM TasksTemp2 WHERE ClientGroup IS NULL;

The first two statements create temporary tables only for
tasks whose status is waiting and whose DependsOnTask is
not set. This prevents tasks that rely on predecessor tasks
from being distributed and speeds up all subsequent
selections. The first select statement looks for tasks
directly assigned to the logical name of a requesting
client. The second statement selects tasks for all groups
the requesting client is a member of. The last statement
only acts on tasks assigned neither to a particular client
nor to a client group. The tasks are ordered by their IDs;
the servlet returns the task with the lowest ID to the
client. This sequence of select queries prioritises tasks
bound to a specific client over tasks for an entire client
group. It thereby prioritises workflows waiting for a special
service available only on a single client and avoids
bottlenecks, as such a workflow does not have to wait
until all further tasks for the client group are completed,
e.g. typically extensive simulation runs intended for a
cluster group.
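The selection order of the three queries can be modelled in memory as follows. This is a simplified sketch, not the servlet code: it uses one record per task instead of one row per job, and the class names `TaskSelector` and `Task` are assumptions.

```java
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the task selection scheme.
class TaskSelector {

    static final class Task {
        final int id; final String status; final Integer dependsOn;
        final String client; final String group;            // may be null
        Task(int id, String status, Integer dependsOn, String client, String group) {
            this.id = id; this.status = status; this.dependsOn = dependsOn;
            this.client = client; this.group = group;
        }
    }

    /** Lowest-ID runnable task: client-bound first, then group-bound,
     *  then unassigned; null if nothing is available. */
    static Task next(List<Task> tasks, String client, Set<String> groups) {
        Task byClient = null, byGroup = null, unassigned = null;
        for (Task t : tasks) {
            // Corresponds to TasksTemp: waiting and no open dependency.
            if (!"waiting".equals(t.status) || t.dependsOn != null) continue;
            if (client.equals(t.client)) {
                byClient = lower(byClient, t);
            } else if (t.client == null && t.group != null) {
                if (groups.contains(t.group)) byGroup = lower(byGroup, t);
            } else if (t.client == null) {
                unassigned = lower(unassigned, t);
            }
        }
        if (byClient != null) return byClient;
        return byGroup != null ? byGroup : unassigned;
    }

    private static Task lower(Task a, Task b) {
        return (a == null || b.id < a.id) ? b : a;
    }
}
```

In the task chain example, a client named server receives nothing while tasks 1 and 3 are unfinished, because tasks 2 and 4 still carry a DependsOnTask entry; a cluster node receives task 1 first.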

The initial status of the task chains is passive to
prevent clients from working on incomplete chains,
where premature calls to TaskCompleted cannot free
successor tasks that have not yet been inserted. After
the last task of a chain has been inserted the chain is
enabled by switching the status to waiting. The SQL
commands are executed during every client request.
To ensure database performance for huge numbers of
tasks we have created indexes on the columns TaskID,
DependsOnTask, Client, ClientGroup and Status.

2.6 Client monitoring 

As process load and available disk space cannot directly
be monitored by a Java virtual machine, we have
integrated a monitor that applies native functions such as
uptime and df on Linux-based systems. The clients
automatically supply the server with this information
during queries for new tasks:

http://.../Tasks?Client=node-01&Load=1.54&Disk=73360476

If no job is available the client sleeps for a certain interval
before repeating its request. Sleep time intervals can be
specified independently for each client in the table ClientsAvailable.
The load and disk space information is also recorded
here:



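select Client, IP, TaskID, LastRequest, Sleeptime,
       Load, Diskspace from ClientsAvailable;
+---------+----+--------+----------------+-----------+------+-----------+
| Client  | IP | TaskID | LastRequest    | Sleeptime | Load | Diskspace |
+---------+----+--------+----------------+-----------+------+-----------+
| node-01 |    | -1     | 06-23 11:23:08 | 300       | 1.54 | 73360476  |
| node-02 |    | -1     | 06-23 11:25:01 | 300       | 1.79 | 71360000  |
| node-03 |    | -1     | 06-23 11:25:21 | 300       | .69  | 69359261  |
| node-04 |    | -1     | 06-23 11:25:23 | 300       | 1.72 | 71559921  |
| server  |    | -1     | 06-23 11:25:46 | 30        | .32  | 86659732  |
+---------+----+--------+----------------+-----------+------+-----------+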



In addition, each client's request automatically updates
the column LastRequest. The columns LastRequest
and Sleeptime make it possible to estimate when the client will
contact the server again. Clients exceeding this time
interval may be down or unable to connect to the server.
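The estimate can be written down as a small check: an idle client is expected back at LastRequest plus Sleeptime, and it is considered overdue once that moment has passed by some tolerance. The sketch below is an illustration only; the class name `ClientLiveness` and the grace period are assumptions, and a client that is busy with a long task would need separate treatment.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical liveness check based on LastRequest and Sleeptime.
class ClientLiveness {

    /** True if an idle client should already have contacted the server again. */
    static boolean isOverdue(LocalDateTime lastRequest, int sleepSeconds,
                             LocalDateTime now, Duration grace) {
        LocalDateTime expected = lastRequest.plusSeconds(sleepSeconds);
        // The grace period absorbs network latency and clock differences.
        return now.isAfter(expected.plus(grace));
    }
}
```

For a client that last asked at 11:23:08 with a sleep time of 300 seconds, the next request is expected at 11:28:08; with a grace period of 30 seconds the client counts as overdue from 11:28:38 onwards.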

With Secure Shell (SSH) [21] active clients can start further
clients on remote machines. Via SSH access the client
software itself can also be distributed, e.g. to all nodes
of a cluster. Table Clients maintains the information
needed: user@client, password or key file, location of the
installation directory the user is allowed to write to, and
the group of clients able to access the remote system.
We integrated a job StartGridClient that utilises the
pure Java SSH implementation JSch [22] to copy the
local installation to remote machines and execute it
there. To apply this method to all clients of a cluster,
e.g. after a reboot, we can insert StartGridClient jobs
for all clients currently not available by querying the
tables Clients and ClientsAvailable. Job submission
is discussed later in section 3.2. After an initial client
has been started manually it automatically installs and
starts further clients. All clients started like this take
part in the process, so the number of active grid
clients doubles at each stage. In this way we managed to install
and start clients on a cluster with 100 nodes in less
than two minutes. This mechanism is also especially
useful for redistributing the entire client system after major
modifications.

2.7 Dynamic class loading 

All job wrappers for client applications are maintained
in a special package directory on the client. Extensions to
or modifications of these wrappers are frequent practice.
To avoid a complete redistribution of the system after
wrapper specific modifications we have additionally
introduced a directory on the web server for compiled job
wrapper classes. The clients utilise dynamic class loading
[23] for subclasses of Job, except the very basic ones.

If a class for a specific job name from a workflow
description, e.g. JobAnsys, cannot be found in the local
repository of the client's JAR file the client attempts
to download and instantiate the wrapper class from
the central web server directory. Thus new application
wrappers and modifications are immediately visible to
all clients within the grid system and can be distributed
without restarting the clients.
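
A minimal sketch of such a fallback, using only the standard java.net.URLClassLoader. The class name WrapperLoader and the method loadWrapper are hypothetical, and the remote directory is passed in as a URL; how the engine itself resolves wrapper classes is not shown in this article.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;

/** Sketch only: local lookup first, then a remote class directory (names are illustrative). */
final class WrapperLoader {

    /**
     * Returns the class from the local class path if present; otherwise tries to
     * load it from remoteClassDir (a directory URL ending in "/").
     */
    static Class<?> loadWrapper(String className, URL remoteClassDir)
            throws ClassNotFoundException {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException notLocal) {
            // The loader is deliberately left open: the loaded class needs it while in use.
            URLClassLoader remote = new URLClassLoader(
                    new URL[] { remoteClassDir }, WrapperLoader.class.getClassLoader());
            return Class.forName(className, true, remote);
        }
    }

    public static void main(String[] args) throws Exception {
        // An empty temporary directory stands in for the web server directory.
        URL dir = Files.createTempDirectory("wrappers").toUri().toURL();
        Class<?> c = loadWrapper("java.lang.String", dir);
        System.out.println("loaded " + c.getName());
        // A wrapper would then be instantiated with c.getDeclaredConstructor().newInstance().
    }
}
```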

2.8 Administrative rules 

The assignment of clients to client groups can be
dynamically changed in the central database according to
administrative rules. One rule states, e.g., that next week
a subset of cluster nodes is booked for benchmarks and
cannot be accessed during the day. Besides the existing
group cluster for all nodes, we have introduced a new
group reserved for this subset. We then establish
cron jobs [24], [25] on the server to have all reserved
clients removed from the cluster group in the morning
and re-assigned in the evening:

# Cron job at 7 am:
#
CREATE TEMPORARY TABLE ReservedNodes SELECT Client FROM ClientGroups
    WHERE ClientGroup="reserved";
UPDATE ClientGroups SET ClientGroup="cluster_suspended"
    WHERE ClientGroup="cluster" AND
          Client IN (SELECT Client FROM ReservedNodes);
UPDATE ClientsAvailable SET SleeptimeMin=3600, SleeptimeMax=3600
    WHERE Client IN (SELECT Client FROM ReservedNodes);

# Cron job at 7 pm:
#
UPDATE ClientsAvailable SET SleeptimeMin=60, SleeptimeMax=300
    WHERE Client IN (SELECT Client FROM ClientGroups
                     WHERE ClientGroup="cluster_suspended");
UPDATE ClientGroups SET ClientGroup="cluster"
    WHERE ClientGroup="cluster_suspended";

Jobs can now be scheduled to the same cluster as before;
at night the cluster is automatically extended by the
nodes reserved for benchmarks. This set-up is especially
useful if the runtime and end of jobs can be estimated
beforehand, as e.g. in the case of many identical simulations
such as parameter studies or optimisation runs.

The first script also prolongs the request interval for
the reserved nodes to 1 hour at daytime. This further
reduces the activity on the benchmark nodes. It is
possible to provide different minimum and maximum values
for the sleep time interval, e.g. 1 to 5 minutes at night.
The server randomly picks a value out of this interval.
These random time offsets prevent long-term
synchronisation when many grid clients request a
new task simultaneously.

We applied long request intervals between 120 and
10,000 seconds (≈ 3 h) in tests with clients on 85 different
cluster nodes, resulting statistically in one node per
minute requesting a new job. If enough jobs are available
an increasing number of nodes gets involved. The client
system implicitly realises load balancing with
Round-Robin scheduling [26], [27] and automatically utilises
more nodes for long running processes. After
approximately 90 minutes the entire cluster is involved.
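
The arithmetic behind "one node per minute" can be checked with a short Java sketch. It assumes that the sleep time is drawn uniformly from the interval; the distribution is not specified in the text. The class and method names are hypothetical.

```java
import java.util.Random;

/** Sketch only: assumes a uniform choice of the sleep time (names are illustrative). */
final class RequestInterval {

    /** Uniform pick from [minSeconds, maxSeconds], both bounds included. */
    static int pickSleeptime(int minSeconds, int maxSeconds, Random rng) {
        return minSeconds + rng.nextInt(maxSeconds - minSeconds + 1);
    }

    /** Mean gap between two requests reaching the server from a number of idle clients. */
    static double meanSecondsBetweenRequests(int minSeconds, int maxSeconds, int clients) {
        double meanSleep = (minSeconds + maxSeconds) / 2.0;
        return meanSleep / clients;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        System.out.println("night-time sleep: " + pickSleeptime(60, 300, rng) + " s");
        // 85 nodes with intervals between 120 s and 10,000 s: roughly one request per minute.
        System.out.printf("mean gap: %.1f s%n", meanSecondsBetweenRequests(120, 10000, 85));
    }
}
```

With a mean sleep time of 5,060 s spread over 85 clients, the server sees a request about every 59.5 s, which matches the figure quoted above.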

The same technique is applied to distribute tasks to
idle cluster nodes. With every request the clients update
their current load levels in the ClientsAvailable table.
A cron job periodically selects clients with load levels
above individual thresholds, removes them from their
regular client groups and puts them on hold as long as
their load level does not change. This does not affect
tasks allocated to a particular client name; these
continue to be executed even if the load level of a
specific client is high.



3 Applications 

In this section a first implementation of the previously
described workflow engine is used in design-of-experiment
studies, parameter optimisations, and robustness
validations of finite element structures. We
propose extensions for workflow submission and stateful
services and discuss interfaces to integrate algorithms
and external libraries into the modular system.




Fig. 2. Two designs of the 60x60 mm plate as finite
element models. The radius is limited to a maximum
of 7 mm so that the holes will not protrude beyond the
plate's edges (left); a minimum of 1 mm was introduced
because the meshing generator cannot process vanishing
holes (right). In experimental design and optimisation
studies the hole radii vary independently.



3.1 Design-of-experiment studies 

The first workflow deals with parameter variations and 
can be used in design-of-experiment (DOE) studies [28]. 
It includes pre- and post-processing jobs to maintain 
all results in database tables. To save simulation time 
we employ a toy model from mechanical engineering 
for the tests. The finite element code Ansys [20] is 
used to simulate the natural frequencies of a square 
plate containing holes of variable diameter. Generally,
vibrational analysis is applied in structural investigations
to ensure that the natural frequencies of a system meet
certain requirements. In automotive development the
natural frequencies of the vehicle body are not allowed
to overlap with excitations by aggregates or road-wheel
interactions. In the development of vibration sensors the
natural frequencies must be robust against parameter
inaccuracies, due to e.g. inhomogeneities of the material
or temperature fluctuations, and the exact shape of the
eigenmodes is essential to develop reliable electrical
transducers.

In the first study we explore the overall behaviour 
of the solid plate with three holes at fixed positions 
and hole radii between 1 and 7 mm. Figure 2 shows 
finite element models of the plate with maximum and 
minimum hole diameters. We define tags for the radii 
in the input deck as described in section 2.3 and create 
300 instances of an Ansys workflow. A ReplaceTag job 
exchanges the tags for randomly distributed values from 
the given interval. Figure 3 shows the result of 300 simu- 
lation runs: the natural frequencies of the lowest modes 
of the vibrating plate. Figure 4 depicts the parameters of 
the hole radii, chosen randomly by ReplaceTag, as grey 
dots. Since these parameter sets cover the entire design 
space, they can be used to construct a Response Surface 
Model (RSM) of the plate system to predict new results 
without further simulations [29]. In section 3.3 we apply

Fig. 3. Natural frequencies of the plate from Fig. 2
with holes of varying radii at fixed positions (material
density 7.83 g/cm³, Young's modulus 2.1·10⁵ N/mm²,
Poisson's ratio 1/3). Ordinate values are extended to
enhance visibility. The unsupported plate was simulated
in two dimensions; the first three modes correspond to
rigid body motions with 0 Hz: two translational and one
rotational degrees of freedom.



a similar approach with direct interpolation on the data. 

Alternative approaches to fill the parameter space are
Factorial Designs [30] and Latin Hypercube Designs [31],
which avoid redundant parameter sets and allow an RSM
of similar accuracy to be constructed in fewer simulation
runs. In such DOEs typically all parameter sets are defined
before the first simulation is run. As shown in section 2.3,
the ReplaceTag job can also handle this DOE with parameter
values deriving from a table of the workflow system.
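
To illustrate the idea of a Latin Hypercube Design, the following Java sketch divides each parameter axis into n equal strata and places exactly one sample in every stratum per axis. It is a generic textbook construction, not the procedure used in this study; the class name LatinHypercube is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch only: basic Latin hypercube sampling in a box [lo, hi]^dims. */
final class LatinHypercube {

    /** Returns n points; along every axis each of the n equal strata holds exactly one point. */
    static double[][] sample(int n, int dims, double lo, double hi, Random rng) {
        double[][] points = new double[n][dims];
        double width = (hi - lo) / n;
        for (int d = 0; d < dims; d++) {
            // Random permutation of the stratum indices 0..n-1 for this axis.
            List<Integer> strata = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                strata.add(i);
            }
            Collections.shuffle(strata, rng);
            for (int i = 0; i < n; i++) {
                // Random position inside the assigned stratum.
                points[i][d] = lo + (strata.get(i) + rng.nextDouble()) * width;
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Ten sets of three hole radii between 1 and 7 mm.
        for (double[] r : sample(10, 3, 1.0, 7.0, new Random())) {
            System.out.printf("%.3f %.3f %.3f%n", r[0], r[1], r[2]);
        }
    }
}
```

The resulting rows could be stored in a parameter table and handed to ReplaceTag jobs in the same way as randomly drawn values.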

The scatterplots in Figure 5 show the resulting fre- 
quency combinations for the natural modes of the plate. 
Even though the parameter space is uniformly sampled 
with random points, the calculated spectrum in the 
frequency space is limited to certain band structures. 
The transformation from parameters r to frequencies f (r) 
involves the full numerics of a finite element simulation. 
In practice this function is not invertible, i.e. no simple
rule can be applied to find parameter sets for a given 
frequency set. 

3.2 Parameter optimisation 

We introduce a new package for workflow submission 
to extend the system by optimisation libraries. The chal- 
lenge was to find a special design of the plate, i.e. a 
certain set of hole radii to produce a given eigenfre- 
quency spectrum. Not all combinations of natural fre- 
quencies can simultaneously be realised as demonstrated 
in Figure 5. We opt for a feasible set of target
frequencies for the lowest three modes,
f* = (28, 30, 33) kHz exactly. Next we look for a parameter
set r = (r1, r2, r3) to minimise the objective function
defined as the square distance from the actual frequencies f to

Fig. 4. Scatterplots of parameters: For the first study we
pick 300 sets of independent hole radii (r1, r2, r3) = r
from the interval 1...7 mm, grey dots. Figure 5 shows
the resulting natural frequencies of the plate. The
results are strongly correlated since each hole radius
influences all frequencies. Black dots are part of a
second study, where an optimisation algorithm fits the radii
in order to produce a given frequency spectrum, i.e.
f* = (28, 30, 33) kHz for the lowest modes. A plate with
r* = (3.521, 3.755, 5.937) mm meets the requirements.
Robustness against parameter inaccuracies of this design
is checked in further data explorations in section 3.3.



the given frequency set f*:

$$o(\mathbf{r}) = |\mathbf{f}(\mathbf{r}) - \mathbf{f}^*|^2 \qquad (1)$$



A generic optimisation algorithm proposes an initial
parameter set r and expects the value of the objective
function o(r) back. Based on the result the algorithm
suggests further parameter sets presumably closer to the
minimum of the objective function. To integrate optimisation
libraries in the workflow system we have developed a
package to submit workflows via HTTP including job
parameters and table descriptions from external
applications. The syntax of the workflow description is similar
to the XML response of the Tasks servlet, but without
the concrete numbering for tasks and jobs.
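
The interface between optimiser and simulation can be sketched as follows. The squared distance is the objective from Eq. (1); everything else is an assumption for illustration: the class name Objective, the functional interface standing in for a complete workflow run, and the toy frequency model in main, which is not the finite element simulation.

```java
import java.util.function.Function;

/** Sketch only: objective function o(r) = |f(r) - f*|^2 (names are illustrative). */
final class Objective {

    /** Squared Euclidean distance between actual and target frequencies. */
    static double squaredDistance(double[] actual, double[] target) {
        double sum = 0.0;
        for (int i = 0; i < target.length; i++) {
            double diff = actual[i] - target[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Evaluates one parameter guess; 'simulation' stands in for a full workflow run. */
    static double evaluate(double[] r, Function<double[], double[]> simulation, double[] target) {
        return squaredDistance(simulation.apply(r), target);
    }

    public static void main(String[] args) {
        double[] target = { 28.0, 30.0, 33.0 };  // f* in kHz
        // Toy stand-in for the simulation: frequencies fall as the holes grow.
        Function<double[], double[]> toy =
                r -> new double[] { 31.0 - r[0], 33.0 - r[1], 38.0 - r[2] };
        System.out.println(evaluate(new double[] { 3.0, 3.0, 5.0 }, toy, target));
        System.out.println(evaluate(new double[] { 2.0, 3.0, 7.0 }, toy, target));
    }
}
```

An optimisation library only ever sees the returned number, so the time-consuming workflow behind it can run anywhere in the grid.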

The following script shows an excerpt of such a
workflow description used in DOE studies and
optimisation runs: a template inputdeck.dat is downloaded
from the server directory models/, the tags r1, r2, and
r3 of the input deck are then replaced by the numerical
values. After completion of the input deck Ansys
is started. JobParseAnsysEigenfreq parses the natural
frequencies from Ansys' output. An upload job sends
the data to the server directory results/sim_001/:

<Task ClientDir="ansys_001" startDependsOnClientGroup="cluster"
      startTaskChain="true"
      startDependsOnGroupNode="true">
  <Job Type="Download">
    <Download Dir="models" File="inputdeck.dat"/>
  </Job>
  <Job Type="JobReplaceTag">
    <Column Name="Input"  Type="VARCHAR" Value="inputdeck.dat"/>
    <Column Name="Output" Type="VARCHAR" Value="inputdeck_mod.dat"/>
    <Column Name="r1" Type="DOUBLE" Value="2.83535"/>
    <Column Name="r2" Type="DOUBLE" Value="3.64375"/>
    <Column Name="r3" Type="DOUBLE" Value="6.21132"/>
  </Job>
</Task>

<Task ClientDir="ansys_001">
  <Job Type="JobAnsys">
    <Column Name="Input" Type="VARCHAR" Value="inputdeck_mod.dat"/>
  </Job>
</Task>

<Task ClientDir="ansys_001">
  <Job Type="JobParseAnsysEigenfreq">
    <Column Name="Freqfile" Type="VARCHAR" Value="eigenfreq.asc"/>
    <Column Name="Modefile" Type="VARCHAR" Value="eigenmode.asc"/>
  </Job>
  <Job Type="Upload">
    <Upload Dir="results/sim_001" File="eigenfreq.asc"/>
  </Job>
</Task>

<Task ClientDir="temp" startDependsOnClient="server">
  <Job Type="Download">
    <Download Dir="results/sim_001" File="eigenfreq.asc"/>
  </Job>
  <Job Type="JobInsertAnsysResultIntoDatabase">
    <Column Name="Tablename" Type="VARCHAR" Value="AnsysResults"/>
    <Column Name="Datafile"  Type="VARCHAR" Value="eigenfreq.asc"/>
    <Column Name="SimID"     Type="INTEGER" Value="001"/>
  </Job>
</Task>

Fig. 5. Scatterplots of the natural frequencies in kHz
for all pairs of modes: Grey dots indicate the results
of the random parameter study to explore the solution
space (cf. Fig. 4). The modes are strongly correlated as
smaller holes increase all frequencies. A mode describes
a certain shape of vibration. The eigenmode solver sorts
all mode shapes by frequency, so the numbering of modes
with similar frequencies is frequently interchanged. These
permutations cause broad scatter bands with accumulations
on the edges (see e.g. mode pair 6 and 7). The black
dots belong to the optimisation study, where the first three
natural modes have to meet the specified frequencies
(red crosses).

The last task is assigned to a grid client on the server
with access to the local database via JDBC [14],
cf. Fig. 1. Like every other client it first downloads
the results to a local working directory temp/ before
a second job inserts them into the table AnsysResults.
The directory extension "_001" is incremented in each
simulation run; in the result table it acts as a unique ID.

The workflow submission package provides a generic 
way to assemble the XML description and convey it to 
the server. A workflow submission client is designed to 
work within the loop of an optimisation algorithm. It 
is triggered by the subroutine that usually evaluates the 
objective function. During the first call JNI Invocation
[19] initialises a persistent instance of a virtual machine
within the C/C++ or Fortran subroutine to hold the Java
submission client. Parameter guesses from the
optimisation library are transparently passed to the client
without restarting the virtual machine at each call. The client
program inserts the guesses into the XML template and
sends the description to a workflow submission servlet.
The client waits until the workflow is processed and the 
value of the objective function is available on the web 
server as a result file. It returns the value to the calling 
function of the optimisation library. 

Insertion of tasks and jobs in the engine tables
and linkage of the dependencies are established
by the submission servlet, controlled by
attributes in the XML workflow description. The attributes
start/closeTaskChain and start/closeTaskGroup
mark the beginning and end of task chains and groups.
The attribute startTaskChain establishes a linear chain:
every task except the very first is blocked by its direct
predecessor (cf. section 2.5).

The flag startDependsOnGroupNode indicates that all
tasks of a chain have to be executed on the same node of
a client group. Task chains can thus be distributed to client
groups where the client receiving the first task will also
process all consecutive tasks of the chain. This allows
more flexibility than assigning all pending jobs to one
single task, since client monitoring and job repetition can
only be applied on a task-wide scale. The flag is stored in
the column DependsOnGroupNode of the Tasks table, and
the servlet TaskCompleted updates the dependencies
after a task from a chain has been completed:

UPDATE Tasks SET Client='$client', ClientGroup=NULL
    WHERE DependsOnTask='$taskID' AND DependsOnGroupNode='true';

Attribute Type annotates the types of data for the
database table corresponding to the job's name. If not
in place, the workflow engine automatically constructs
the table JobReplaceTag with columns r1, r2, r3 of type
DOUBLE for the parameter values and Input, Output of
type VARCHAR for the names of the input and output files
respectively.

The attribute startTaskGroup is used to split the
workflow and create parallel branches. Each branch
of a task group constitutes an independent chain; the
initial tasks of all chains are coupled to the same ID
of a preceding task. This preceding task unblocks all
branches simultaneously. Conversely closeTaskGroup
acts as a thread barrier waiting until all tasks of a
group are completed. It employs the column TaskGroup
of the Tasks table and a dummy task as special job
MonitorTaskGroup monitoring the status of all tasks
with a certain group ID. In the following example two
independent branches (tasks 378 and 384) are already
processed by nodes 04 and 02. Nodes 01 - 04 are now
active on further tasks (status active):

+--------+------------------+---------+---------+---------+--------+-------+--------+
| TaskID | Job              | Depends | Client  | Client  | DepOn  | Task  | Status |
|        |                  | OnTask  |         | Group   | GrNode | Group |        |
+--------+------------------+---------+---------+---------+--------+-------+--------+
|    376 | Simulation       |         | node-03 | cluster |        |   375 | active |
|    377 | Upload           |     376 |         | cluster | true   |   375 | ...    |
|    378 | Simulation       |         | node-04 | cluster |        |   375 | done   |
|    379 | Upload           |         | node-04 | cluster | true   |   375 | ...    |
|    380 | Simulation       |         | node-01 | cluster |        |   375 | active |
|    381 | Upload           |     380 |         | cluster | true   |   375 | ...    |
|    382 | Simulation       |         | node-04 | cluster |        |   375 | active |
|    383 | Upload           |     382 |         | cluster | true   |   375 | ...    |
|    384 | Simulation       |         | node-02 | cluster |        |   375 | done   |
|    385 | Upload           |         | node-02 | cluster | true   |   375 | ...    |
|    386 | Simulation       |         | node-02 | cluster |        |   375 | active |
|    387 | Upload           |     386 |         | cluster | true   |   375 | ...    |
|    388 | Simulation       |         |         | cluster |        |   375 | ...    |
|    389 | Upload           |     388 |         | cluster | true   |   375 | ...    |
|    390 | MonitorTaskGroup | ...     | ...     | ...     | ...    | ...   | ...    |
|    391 | CreateReport     | ...     | ...     | ...     | ...    | ...   | ...    |
+--------+------------------+---------+---------+---------+--------+-------+--------+


The monitor is implemented in the Tasks servlet. If the
number of tasks with group ID 375 and a status other
than done drops to zero, the servlet sets the monitor task to
done and unblocks dependent tasks, here task 391 that
generates a report of all performed simulation runs.
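
The barrier logic can be sketched in memory as follows. In the engine this check is presumably an SQL count on the Tasks table; the Java classes below are hypothetical stand-ins, and the sketch assumes that the monitor task itself does not carry the group ID it watches.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: in-memory version of the task group barrier (names are illustrative). */
final class TaskGroupMonitor {

    /** Minimal stand-in for a row of the Tasks table. */
    static final class Task {
        final int id;
        final int taskGroup;
        String status;

        Task(int id, int taskGroup, String status) {
            this.id = id;
            this.taskGroup = taskGroup;
            this.status = status;
        }
    }

    /** Number of tasks in the group whose status is not "done". */
    static long pending(List<Task> tasks, int groupId) {
        return tasks.stream()
                .filter(t -> t.taskGroup == groupId && !"done".equals(t.status))
                .count();
    }

    /** Sets the monitor task to "done" once the group is complete; returns true if it did. */
    static boolean releaseIfComplete(List<Task> tasks, int groupId, Task monitor) {
        if (pending(tasks, groupId) == 0) {
            monitor.status = "done";  // dependent tasks are unblocked from here on
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task(376, 375, "active"));
        tasks.add(new Task(378, 375, "done"));
        Task monitor = new Task(390, 0, "waiting");
        System.out.println(releaseIfComplete(tasks, 375, monitor));  // one task still active
        tasks.get(0).status = "done";
        System.out.println(releaseIfComplete(tasks, 375, monitor));  // group complete
    }
}
```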

Using the workflow submission package we have 
interfaced the engine with the Adaptive Simulated An- 
nealing optimisation library [32], [33]. Simulated An- 
nealing attempts to find the global minimum of the 
objective function in a stochastic process [34]. In Fig- 
ures 4 and 5 the black dots indicate the progress of the 
optimisation procedure. The final parameter set r* = 
(3.521,3.755,5.937) mm yields a plate with the exact 
frequency spectrum demanded. In the next section we 
analyse how robust the solution will be if the parameters 
differ from this optimal setting. 

3.3 Robustness of solutions 

The data from the previous sections is employed in 
a visualisation workflow analysing the robustness of 
a design against parameter fluctuations. The workflow 
automatically creates diagrams with the interactive data 
plotting utility Gnuplot [35] and the R package for statis- 
tical computing [36]. The parameter sets from the DOE 
study and the optimisation process are reused to inter- 
polate the frequency response f(r) around the optimum 
value r*. Various simulated points in the immediate 
vicinity allow a detailed prediction for further parameter 
sets without additional numerical simulations. 



A service supplied with a set of measured points and
a list of parameters with unknown values executes the
interpolation. The service tessellates the measured points
using Qhull [37], [38] in the 3-dimensional parameter
space, i.e. it constructs tetrahedrons between every four
points with no further point inside. The values of
arbitrary parameter points are then interpolated from the
corner values of the tetrahedron containing the new
point. We have employed the interpolated frequency
response to examine different aspects of the robustness
of the optimal design:

• To get frequency contour lines in the parameter
space as depicted in Figure 6 we have evaluated
the frequency response function on regular grids in
three perpendicular planes. The contour lines show
how the value of the objective function varies for
parameter sets next to the optimal point r* (black dot).
Large distances between contour lines indicate areas
of robust solutions with the frequency value being
only slightly affected by parameter inaccuracies. The
reliability of this prediction depends on the distance
between an interpolated parameter point r and the
next simulated point, cf. the grey background scale
in Fig. 6. Since part of the data stems from the
optimisation run many simulated points are close
to r*.

• We have also interpolated the function on a cloud
of random points r around the optimum. Figure 7
shows the resulting frequencies depending on the
distance d = |r - r*| to the optimal point r*.
Deviations from the optimal parameter set result in
enlarged frequency spectra; the solid lines confine
the spectra and specify worst case frequency values
for parameter inaccuracies.

• Capability study techniques treat inevitable
inaccuracies in the manufacturing processes by specific
parameter distributions, e.g. the realisations r are
assumed to be normally distributed around their
optimum value r*. Histograms or box plots of the
resulting frequency spectrum are used in Six Sigma
analysis to determine how the manufacturing
process meets specification limits [39].

For the last case we have generated 10,000 parameter
sets r = (r1, r2, r3) following a multivariate normal
probability density with mean r* and standard deviation
σ = 0.01 mm for all D = 3 parameters:

$$p(\mathbf{r}) = \frac{1}{(2\pi)^{D/2}\,\sigma^{D}} \exp\left(-\frac{|\mathbf{r}-\mathbf{r}^*|^2}{2\sigma^2}\right) \qquad (2)$$

To deal with independent or correlated deviations (σ1,
σ2, σ3) a more general form of (2) can be applied, see [40].
The results are summarised in Figure 8. For the given
manufacturing precision almost 95% of the plates have
natural frequencies in the range f_Mode4 = [27.973; 28.029],
f_Mode5 = [29.964; 30.039], and f_Mode6 = [32.983; 33.013]
kHz.
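
Drawing parameter sets according to Eq. (2) amounts to adding independent Gaussian deviations to each radius. The following Java sketch uses only java.util.Random; the class name NormalScatter and its methods are hypothetical, and the sketch samples the radii only, without evaluating any frequencies.

```java
import java.util.Random;

/** Sketch only: samples r ~ N(r*, sigma^2 I) as in Eq. (2) (names are illustrative). */
final class NormalScatter {

    /** Draws n parameter sets with independent normal deviations around rOpt. */
    static double[][] sample(double[] rOpt, double sigma, int n, Random rng) {
        double[][] r = new double[n][rOpt.length];
        for (int i = 0; i < n; i++) {
            for (int d = 0; d < rOpt.length; d++) {
                r[i][d] = rOpt[d] + sigma * rng.nextGaussian();
            }
        }
        return r;
    }

    /** Fraction of samples whose coordinate d lies within rOpt[d] +/- k * sigma. */
    static double fractionWithin(double[][] r, double[] rOpt, double sigma, double k, int d) {
        int inside = 0;
        for (double[] p : r) {
            if (Math.abs(p[d] - rOpt[d]) <= k * sigma) {
                inside++;
            }
        }
        return (double) inside / r.length;
    }

    public static void main(String[] args) {
        double[] rOpt = { 3.521, 3.755, 5.937 };  // optimal radii in mm
        double[][] r = sample(rOpt, 0.01, 10000, new Random());
        // About 95% of the samples are expected within two standard deviations.
        System.out.printf("within 2 sigma: %.3f%n", fractionWithin(r, rOpt, 0.01, 2.0, 0));
    }
}
```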

The diagrams are created by R and Gnuplot jobs in

Fig. 6. Robustness of the solution: Hole radii r1 = 3.521, r2 = 3.755 and r3 = 5.937 mm (point in the center) yield
exactly the desired mode frequencies. Contour lines indicate the frequency variation of deviations from this optimal
point, denoted in per mille of the original value, i.e. for mode 5 (30 kHz) the first line "0.1 ‰" corresponds to 29.997 and
30.003 kHz. The contour lines derive from the interpolation of the scattered simulation data. Their reliability is heavily
dependent on the distance to the closest simulation point in the parameter space. The distance from interpolated
points to the nearest simulated point is depicted as grey scale in the background in [mm]. In white areas a simulated
parameter set (r1, r2, r3) is always within a distance of at most 0.03 mm.



the workflow chain. We have manually assembled a 
prototype of each diagram and used it as a template for 
wrappers of the visualisation programs. Workflows then 
produce all diagrams supplying the scripts with alternat- 
ing pairs of column numbers and the corresponding axis 
labels. The listing shows a wrapper for Gnuplot creating
the contour plots from Figure 6. The file s1 contains
the Gnuplot controls from the template, script s2 starts
Gnuplot with these controls:

public class JobGnuplot extends Job
{
    String inputfile;
    String outputfile;
    String colX;
    // further fields used below (colY, colZ, title, labels, ranges) are not shown

    public void setParameters(Element parameter)
    {
        inputfile  = getAttribute(parameter, "Input");
        outputfile = getAttribute(parameter, "Output");
        colX       = getAttribute(parameter, "Column_X");
    }

    public void run()
    {
        // assemble the Gnuplot control file from the template
        StringBuffer t = new StringBuffer();
        t.append("set term post color solid enhanced 8" + "\n");
        t.append("set out '" + outputfile + "'" + "\n");
        t.append("set title '" + title + "'" + "\n");
        t.append("set grid" + "\n");
        t.append("unset surface" + "\n");
        t.append("set contour base" + "\n");
        t.append("set cntrparam levels increm 0.0, 0.1" + "\n");
        t.append("set view 0, 90" + "\n");
        t.append("set xlabel '" + xlabel + "'" + "\n");
        t.append("set ylabel '" + ylabel + "'" + "\n");
        t.append("set xrange [" + x1 + ":" + x2 + "]" + "\n");
        t.append("set yrange [" + y1 + ":" + y2 + "]" + "\n");
        t.append("splot '" + inputfile + "' using " +
                 colX + ":" + colY + ":" + colZ + " with lines" + "\n");

        // s1: write the control file
        String controlfile = "gnuplot.control";
        Script s1 = new Script(dir, controlfile);
        s1.createScript(t.toString());

        // s2: shell script that starts Gnuplot with the control file
        String text = "GNUPLOT " + controlfile;
        Script s2 = new Script(dir, "start_gnuplot");
        s2.createScript(getShellHeader() + replaceAliases(text));
        s2.executeScript();

        flag_done = true;
    }
}



Figures 6-8 show the results of the visualisation
workflows. In the diagrams only the influence of design

Fig. 7. Variation of mode frequency with parameters
(r1, r2, r3) = r differing from their optimum value r*.
The distance from the optimal value is defined by d =
[(r1 - r1*)² + (r2 - r2*)² + (r3 - r3*)²]^(1/2); red crosses mark
data points from simulation runs, black dots are
interpolated. Solid lines indicate worst expected values, i.e. the
small aperture angle indicates that mode 6 is most robust
against parameter variation.




Fig. 8. Distributions of natural frequencies if hole radii r
follow normal distributions around their optimal value r*
with standard deviation σ = 0.01 mm, i.e. approximately
95% of the samples are within range r* ± 2σ. 10,000
interpolated simulations were used to create the histograms
and the boxplots at the bottom; green boxes mark the
central 50% of the results (between the quartiles).



parameter inaccuracies are examined. Other parameters, 
not subject to the design or optimisation procedure, may 
have more impact on the robustness of a solution, e.g. 
the thickness of the plate, homogeneity of the material, 
positions of the holes, or frequency shifts by temper- 
ature. Their influence can be evaluated in additional 
DOE studies applying the same methods and automated 
workflows discussed before. 



3.4 Further aspects 

3.4.1 Stateful services 

The interpolation service from the previous section takes
a list of simulated parameters with known frequencies
as input. Every call of the service results in
a new tessellation of all points from the list. This
redundancy can be avoided by a stateful service, where
the behaviour in consecutive calls depends on the actual
state of the service. In the case of the interpolation
service, the tessellated models are stored in the local
working directory. The MD5 algorithm [16] calculates
hash keys for the content of the input files. These keys,
32 characters long, are used as file names to label the
tessellated models. In every call the service first
checks whether a model file for the input data exists.
When identical input data occur, as happens many times
during the generation of the diagrams, the service works with
the already tessellated models from the repository. An
implementation of the interpolation as a web service
further increases the performance, since the model data
can be held in the main memory of the application
server.
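
The hash key can be computed with the standard java.security.MessageDigest class, as in the following sketch. The class ModelCache and the way the model file is located are assumptions for illustration; only the use of MD5 and the 32-character key as file name are taken from the text.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch only: MD5 content hash used as file name of a cached model (names are illustrative). */
final class ModelCache {

    /** Returns the MD5 digest of the content as 32 lower-case hex characters. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Location of the tessellated model that belongs to the given input data. */
    static Path modelFile(Path workingDir, byte[] input) {
        return workingDir.resolve(md5Hex(input));
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("models");
        byte[] input = "3.521 3.755 5.937 28.0 30.0 33.0\n".getBytes(StandardCharsets.UTF_8);
        Path model = modelFile(dir, input);
        // Tessellate only if no model for identical input data exists yet.
        System.out.println(model.getFileName() + " cached: " + Files.exists(model));
    }
}
```

MD5 is adequate here because the key only has to detect identical input, not to resist deliberate collisions.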



3.4.2 Billing 

As the runtime of all processed tasks is logged in the
table TasksCompleted it can be used to implement a
billing system for expensive simulation tasks. In the
database we have selected the runtimes of, e.g., all
Ansys simulations and grouped them by users. The
runtimes are multiplied with performance factors from the
Clients table describing the individual speed of a node.
If more processes than available processors (or cores) are
active the runtimes are divided by the load factor. The grid
client monitors the load during the job execution and
periodically sends it to the server. The time interval is
customisable; the default is every two minutes. For the
billing system a cron job logs the varying loads from
table ClientsAvailable for further evaluations.

We developed standard benchmark jobs to automatically
get presets for the performance factors, e.g. we use
a program to calculate the Mandelbrot set [41] for
different resolutions and iteration depths. The benchmark
jobs are executed on the clients at different load levels.
The runtimes of the jobs serve as a relative measure for
the performance of the clients.

3.4.3 Dedicated servers and distributed file systems

In the basic set-up, the web server for file transfers, the
servlet container for communication with the clients, and
also the database engine are active on the same host.
Especially during the transmission of extensive data
the web server becomes a bottleneck and slows down
the concurrent processes. The central file repository can
be relocated to a dedicated web server to enhance the
performance. In the default configuration the grid client
contacts servlet container and web server at the same
address.

The concept of a server file system on the engine also
makes it possible to apply more sophisticated methods for
data exchange. The system was originally designed to deal
with clients behind firewalls interacting only via HTTP
with the central server. If the clients share common
directories via network file systems, e.g. Samba [42], NFS
[43], or AFS with caching facilities [44], [45], the central
file repository can be replaced by such a resource. In
this scenario download and upload jobs are simplified to
native methods that copy data between local directories.
Even though the network file system is now responsible
for sharing the data between the clients, this change does
not affect the abstract workflow descriptions.

Also peer-to-peer file systems, which are highly op- 
timised for data throughput, can be embedded as a 
transport layer, e.g. XtreemFS [46]. They organise data 
in chunks and distribute them to all network nodes. Ac- 
cess to remote data initiates parallel data transfers from 
different hosts. Since the data chunks are replicated on 
several locations, the system is robust against failures of 
single nodes. In each case the workflow engine remains 
responsible for the synchronisation of the data transfers. 
A consecutive download process of a client will not be
initiated before the upload of the data from another
client has been finished.

4 Conclusion 

The proposed platform independent client-server-system
quickly turns a large number of heterogeneous computers
into a flexible computing grid without expensive
installations or administrator rights. The system is
robust against unreliable network connections or firewall
restrictions as the clients periodically contact the server
engine via stateless HTTP. The engine utilises database
tables to implement the logic of complex parallel
workflows. It is based on a table for tasks and servlets to
communicate with the clients. Further tables and servlets
extend the modular system to execute customised
applications, platform independent services, workflow
submission, and status information of the system itself. The
structure of all tables is self-explanatory. The data is
processed and combined by SQL statements from the servlets.
The XML workflow descriptions and the database tables
provide generic interfaces to couple further applications,
e.g. optimisation libraries, with the system.



The engine distinguishes job execution from data
transmission. Client applications only work on data
in local directories while the system is responsible for
transferring the data. Testing and debugging of complex
workflows benefit from the fact that all intermediate data
is accessible in the client's working directory. Workflow
chains can be tested step-by-step as job wrappers for
applications can be started manually using their main()
method. Data transfer can be substituted by alternative
methods, e.g. network file systems or distributed
peer-to-peer networks.

The development of the engine was motivated by actual
requirements of engineering and scientific computing.
We have demonstrated how to accomplish
design-of-experiment studies and parameter optimisations in
finite element applications. We want to stress that
pre- and post-processing and data visualisation routines can
easily be integrated in the workflow chains. The
hierarchical description makes it possible to extend and
recombine existing workflows. The data management is greatly
simplified as all application parameters and results are
available in the database tables of the engine.

Currently, we are working on a consolidated version
of the system that we plan to release as an open source
project. Besides a consistent renaming of the classes and
a complete refactoring of the code, we aim to integrate
a database abstraction layer to simplify the integration
of client applications, e.g. Hibernate/JPA [47]. Hibernate
encapsulates database-specific implementations and
creates table schemes directly from job wrapper entities. In
addition, we are developing a web interface to control
the engine by browser. We plan to enhance data
transfer for cluster computing, where clients are mutually
connected and data transfer via a central server is a
redundant detour.